Computational testing of five Swahili dictionaries
Abstract
This paper introduces a computational method for testing dictionaries. It describes how the method was applied to five current dictionaries of Swahili and reports a number of test results. The tested dictionaries are Kamusi ya Kiswahili Sanifu (TUKI), Kamusi ya Maana na Matumizi (OUP), Modern Swahili Modern English Dictionary (MStryck), Kamusi ya Kiswahili Kiingereza (TUKI), and Swahili Suomi Swahili -sanakirja (SKS). Each dictionary was tested with a dictionary-specific version of SWATWOL, a two-level parser of Swahili. The recall of each dictionary was measured on three test corpora, and the proportion of unused words in each dictionary was also measured. In addition, the performance of each dictionary was tested for some word classes. The test results are summarized in tables and graphs.
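The two corpus-based measures named in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the paper analyzes text with SWATWOL, whereas this sketch assumes the corpus has already been reduced to lemmas. It also assumes that recall means the share of corpus tokens whose lemma is a dictionary headword, and that an "unused" word is a headword that never occurs in the corpus. The function names and the toy data are hypothetical.

```python
def recall(corpus_lemmas, headwords):
    """Share of corpus tokens whose lemma is a dictionary headword (assumed definition)."""
    if not corpus_lemmas:
        return 0.0
    covered = sum(1 for lemma in corpus_lemmas if lemma in headwords)
    return covered / len(corpus_lemmas)


def unused_proportion(corpus_lemmas, headwords):
    """Share of dictionary headwords that never occur in the corpus (assumed definition)."""
    if not headwords:
        return 0.0
    seen = set(corpus_lemmas)
    unused = [h for h in headwords if h not in seen]
    return len(unused) / len(headwords)


# Hypothetical toy data: a lemmatized corpus and a small headword list.
corpus = ["kitabu", "soma", "mtu", "kitabu", "nyumba"]
headwords = {"kitabu", "soma", "mti", "nyumba"}

print(recall(corpus, headwords))             # 4 of 5 tokens are covered
print(unused_proportion(corpus, headwords))  # 1 of 4 headwords is never used
```

Running the same two functions once per dictionary and per test corpus would give the kind of comparison table the abstract mentions.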
Similar resources
Improving the Computational Morphological Analysis of a Swahili Corpus for Lexicographic Purposes
Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary
Applying Finite-State Methods to the Swahili Language
Herein, we explore the current finite-state methods that exist for analyzing English grammar and decide whether they can be applied to the Swahili language and Swahili syntactic patterns. Further, we explore the differences between Swahili grammar and English grammar to see if it is possible to accommodate these finite-state methods to the Swahili language. In the end, the objective is to de...
Nordic Journal of African Studies 4(2): 81-92 (1995)
This paper presents some applications of SWATWOL, a morphological parser of Swahili, for information retrieval. It presents a solution to the problem of retrieving accurate linguistic information in a language where word formation branches out from the lemma in both directions. After discussing technical problems and their solution, some research tasks that have been carried out, or which are ...
A Repository of Free Lexical Resources for African Languages: The Project and the Method
We report on a project which we believe has the potential to become home to, among others, bilingual dictionaries for African languages. Kept in a well-structured XML format with several possible degrees of conformance, the dictionaries will be usable even in their early versions, which will then be subject to supervised improvement as user feedback accumulates. The project is F...
Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English Language Data
Codeswitching is a very common behavior among Swahili speakers, but of the little computational work done on Swahili, none has focused on codeswitching. This paper addresses two tasks relating to Swahili-English codeswitching: word-level language identification and prediction of codeswitch points. Our two-step model achieves high accuracy at labeling the language of words using a simple feature...